AdaBoost Learning for Detecting and Recognizing Text
نویسندگان
چکیده
This paper gives an algorithm for detecting and reading text in natural images. The algorithm is intended for use by blind and visually impaired subjects walking through city scenes. We first obtain a dataset of city images taken by blind and normally sighted subjects. From this dataset, we manually label and extract the text regions. Next we perform statistical analysis of the text regions to determine which image features are reliable indicators of text and have low entropy (i.e. feature response is similar for all text images). We obtain weak classifiers by using joint probabilities for feature responses on and off text. These weak classifiers are used as input to an AdaBoost machine learning algorithm to train a strong classifier. In practice, we trained a cascade with 4 strong classifiers containg 79 features. An adaptive binarization and extension algorithm is applied to those regions selected by the cascade classifier. An commercial OCR software is used to read the text or reject it as a non-text region. The overall algorithm has a success rate of over 90% (evaluated by complete detection and reading of the text) on the test set and the unread text is typically small and distant from the viewer.
منابع مشابه
An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification
In recent years, production of text documents has seen an exponential growth, which is the reason why their proper classification seems necessary for better access. One of the main problems of classifying text documents is working in high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...
متن کاملAn Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification
In recent years, production of text documents has seen an exponential growth, which is the reason why their proper classification seems necessary for better access. One of the main problems of classifying text documents is working in high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...
متن کاملAdaBoost Learning for Detecting and Reading Text in City Scenes
This paper gives an algorithm for detecting and reading text in natural images. The algorithm is intended for use by blind and visually impaired subjects walking through city scenes. We first obtain a dataset of city images taken by blind and normally sighted subjects. From this dataset, we manually label and extract the text regions. Next we perform statistical analysis of the text regions to ...
متن کاملOn Detecting Spatially Similar and Dissimilar Objects Using Adaboost
AdaBoost has been verified to be proficient in processing images rapidly while attaining high detection rate in face detection. The speed of AdaBoost in face detection is demonstrated in [1], where the detection can be performed in 15 frames per second. The robust speediness and the high accuracy in tracing the target objects have enable AdaBoost to be successful in classification problems. In ...
متن کاملAdaBoost for text detection
Detecting text regions in natural scenes is an important part of computer vision. We propose a novel text detection algorithm that extracts six different classes features of text, and uses Modest AdaBoost with multi-scale sequential search. Experiments show that our algorithm can detect text regions with a f= 0.70, from the ICDAR 2003 datasets which include images with text of various fonts, si...
متن کامل